I used maximum frequency as a way to compare ESM log-likelihood scores against a Nextstrain tree.
In my slides I used these figures to describe maximum frequency:
I had them on separate slides, which made it less clear that they were connected.
Here’s an image from one of Trevor’s slides that highlights maximum frequency more clearly:
In 2011, some 53N nodes on the tree would have a maximum frequency of 0.5.
Over time, 53N is replaced by a more dominant lineage, 278K, which eventually reaches 100% maximum frequency. However, for earlier 278K nodes on the tree (2012-2014), the maximum frequency falls between 0 and 1.
Frequencies for the terminal node A/Norway/1824/2015 (stored in a JSON file):
{
0.003671,
0.016207,
6e-06
}
Shown here are only this node's frequencies above 0% (in the JSON file, the values before and after these points are zeros, indicating no samples were collected at those time points).
To get the maximum frequency for this node, we select its highest frequency across all time points: 0.016207.
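A minimal Python sketch of this step, assuming the frequencies live in a tip-frequencies JSON where each strain name maps to `{"frequencies": [...]}` with one value per time point (our assumption about the file layout). The zeros padding the excerpt are placeholders; only the three nonzero values come from the JSON excerpt above. `max_frequency` is a hypothetical helper name.

```python
import json

# Toy excerpt in an assumed tip-frequencies layout: strain -> {"frequencies": [...]},
# one value per time point. The zeros are placeholders; the nonzero values are from the text.
TIP_FREQUENCIES_JSON = """
{
  "A/Norway/1824/2015": {"frequencies": [0.0, 0.0, 0.003671, 0.016207, 6e-06, 0.0]}
}
"""


def max_frequency(frequencies: list[float]) -> float:
    """Hypothetical helper: a node's highest frequency over all time points."""
    return max(frequencies)


tips = json.loads(TIP_FREQUENCIES_JSON)
norway = tips["A/Norway/1824/2015"]["frequencies"]
print(max_frequency(norway))  # 0.016207
```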
We can visualize these points on a Nextstrain tree if we turn off frequency normalization:
If we normalize frequencies in Nextstrain, clade 3C.3b would show a frequency of 100%, but we want to see the frequency of this internal node relative to all other clades circulating during this period.
Nextstrain frequency normalization does not show us this:
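To see why normalization hides this, here is a minimal sketch that assumes normalization rescales the frequencies of the tips currently in view so they sum to 1 at each time point. This is our reading of the normalize toggle, not Nextstrain's actual code, and `normalize_visible` is a hypothetical name. The values are the frequencies of this internal node's two children, A/Norway/1824/2015 and A/SouthDakota/11/2015, taken from the JSON file.

```python
# Frequencies of the internal node's two children at three time points (from the JSON file).
children = {
    "A/Norway/1824/2015": [0.003671, 0.016207, 6e-06],
    "A/SouthDakota/11/2015": [0.003443, 0.016774, 6e-06],
}


def normalize_visible(freqs: dict[str, list[float]]) -> dict[str, list[float]]:
    """Hypothetical normalization: rescale visible tips so they sum to 1 per time point."""
    n_points = len(next(iter(freqs.values())))
    totals = [sum(f[i] for f in freqs.values()) for i in range(n_points)]
    return {
        strain: [f[i] / totals[i] if totals[i] > 0 else 0.0 for i in range(n_points)]
        for strain, f in freqs.items()
    }


normalized = normalize_visible(children)
for i in range(3):
    clade_total = sum(f[i] for f in normalized.values())
    # Every time point prints 1.00: with only this clade in view, it always sums to 100%.
    print(f"time point {i}: normalized clade frequency = {clade_total:.2f}")
```

Under this assumption the raw peak of about 3% becomes 100% at every time point, which is exactly the information we lose with normalization on.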
As before, we return to the JSON file, this time including all children of the internal node:
Frequencies for terminal node A/Norway/1824/2015:
{
0.003671,
0.016207,
6e-06
}
Frequencies for terminal node A/SouthDakota/11/2015:
{
0.003443,
0.016774,
6e-06
}
To get the maximum frequency of the internal node, we first add its children's frequencies at each time point:
{
0.003671 + 0.003443 = 0.007114
0.016207 + 0.016774 = 0.032981
6e-06 + 6e-06 = 0.000012
}
Then we choose the highest of these sums:
{
0.007114,
0.032981,
0.000012
}
The maximum frequency of the internal node is 0.032981, or about 3%.
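The sum-then-max procedure can be sketched in a few lines of Python. This is a minimal illustration using the two children's values from the JSON file; the helper names `internal_node_frequencies` and `max_frequency` are hypothetical.

```python
# Children of the internal node and their frequencies per time point (from the JSON file).
children = {
    "A/Norway/1824/2015": [0.003671, 0.016207, 6e-06],
    "A/SouthDakota/11/2015": [0.003443, 0.016774, 6e-06],
}


def internal_node_frequencies(child_freqs: list[list[float]]) -> list[float]:
    """Hypothetical helper: sum the children's frequencies at each time point."""
    return [sum(values) for values in zip(*child_freqs)]


def max_frequency(frequencies: list[float]) -> float:
    """Hypothetical helper: the highest frequency over all time points."""
    return max(frequencies)


node_freqs = internal_node_frequencies(list(children.values()))
print([f"{f:.6f}" for f in node_freqs])  # ['0.007114', '0.032981', '0.000012']
print(f"{max_frequency(node_freqs):.6f}")  # 0.032981
```

Summing per time point before taking the maximum matters: the children could peak at different time points, so adding each child's individual maximum would overstate the node's frequency.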
This is viewable in Nextstrain with frequency normalization turned off:
Frequency normalization off: in 2015 the internal node varied in frequency but reached a peak of 3%.
Using maximum frequency, we assume that at a given point in time a node with a higher maximum frequency is more fit than a node with a lower one. The “trunk” of our tree would then be the chain of connected nodes over time that reach 100% maximum frequency.
As we zoom out and get closer to the trunk of the tree, we start to see nodes that reach 100% maximum frequency.
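A minimal sketch of how trunk candidates could be picked out of a whole tree, assuming (as in the internal-node example) that a node's frequency at each time point is the sum of its descendant tips' frequencies. The tree, tip names, frequency values, and the 0.999 cutoff for "100%" are all hypothetical; the nested `{"name", "children"}` shape is only meant to resemble a Nextstrain tree JSON.

```python
# Hypothetical toy tree with a nested {"name", "children"} shape.
tree = {
    "name": "root",
    "children": [
        {"name": "node_1", "children": [{"name": "tip_a"}, {"name": "tip_b"}]},
        {"name": "tip_c"},
    ],
}

# Hypothetical tip frequencies at three time points (each time point sums to 1).
tip_freqs = {
    "tip_a": [0.1, 0.5, 0.9],
    "tip_b": [0.0, 0.2, 0.1],
    "tip_c": [0.9, 0.3, 0.0],
}


def node_frequencies(node: dict, tips: dict, out: dict) -> list[float]:
    """Post-order traversal: a node's frequency per time point is the sum over its children."""
    if "children" not in node:
        freqs = tips[node["name"]]
    else:
        child_freqs = [node_frequencies(c, tips, out) for c in node["children"]]
        freqs = [sum(values) for values in zip(*child_freqs)]
    out[node["name"]] = freqs
    return freqs


all_freqs: dict[str, list[float]] = {}
node_frequencies(tree, tip_freqs, all_freqs)
max_freqs = {name: max(f) for name, f in all_freqs.items()}
# Nodes whose maximum frequency reaches ~100% are trunk candidates (0.999 absorbs float error).
trunk = sorted(name for name, m in max_freqs.items() if m >= 0.999)
print(trunk)  # ['node_1', 'root']
```

In this toy, `node_1` starts at 10% but takes over by the last time point, so it joins the root on the trunk, while no single tip ever reaches 100%.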
Clade 3C.3b is never the only circulating variant, and none of its nodes ever reach 100% frequency; the highest maximum frequency, 9%, is reached by some nodes in 2015.
Maximum frequency was used as a point of comparison for deep mutational scanning of flu HA in this paper.